Learning Visual Context for Group Activity Recognition

Authors

Abstract

Group activity recognition aims to recognize an overall activity in a multi-person scene. Previous methods strive to reason on individual features. However, they under-explore the person-specific contextual information, which is significant and informative in computer vision tasks. In this paper, we propose a new reasoning paradigm to incorporate global contextual information. Specifically, we design two modules to bridge the gap between group activity and visual context. The first is a Transformer-based Context Encoding (TCE) module, which enhances individual representations by encoding global contextual information into individual features and refining the aggregated information. The second is a Spatial-Temporal Bilinear Pooling (STBiP) module. It first explores pairwise spatial-temporal relationships among the context-encoded representations, and then generates semantic representations via gated message passing on a constructed spatial-temporal graph. On their basis, we design a two-branch model that integrates the designed modules into a pipeline. Systematic experiments demonstrate each module's effectiveness in either branch. Visualizations indicate that contextual cues can be aggregated globally by TCE. Moreover, our method achieves state-of-the-art results on widely used benchmarks using only RGB images as input and 2D backbones.
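
To make the pipeline described in the abstract more concrete, the following is a minimal, hypothetical PyTorch sketch of a two-module head of this kind: a Transformer encoder that injects global context into per-person features (in the spirit of TCE), followed by bilinear pairwise pooling with a gated message-passing step over a fully connected spatial-temporal graph (in the spirit of STBiP). All class names, tensor shapes, and hyper-parameters are illustrative assumptions and do not reproduce the authors' released implementation.

# Hypothetical sketch of the two modules described in the abstract (not the authors' code).
import torch
import torch.nn as nn


class TCE(nn.Module):
    """Transformer-based context encoding: each person's feature attends to
    all others to absorb global visual context (assumed design)."""

    def __init__(self, dim=256, heads=4, layers=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):                      # x: (B, N, D) person features
        return self.encoder(x)                 # context-refined features


class STBiP(nn.Module):
    """Spatial-temporal bilinear pooling sketch: bilinear pairwise relation
    features, aggregated by a gated message-passing step on a dense graph."""

    def __init__(self, dim=256):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, dim)   # pairwise relation features
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):                      # x: (B, N, D)
        B, N, D = x.shape
        xi = x.unsqueeze(2).expand(B, N, N, D)       # sender nodes
        xj = x.unsqueeze(1).expand(B, N, N, D)       # receiver nodes
        rel = self.bilinear(xi.reshape(-1, D), xj.reshape(-1, D)).view(B, N, N, D)
        msg = (self.gate(rel) * rel).mean(dim=2)     # gated message passing
        return x + msg                               # updated individual features


class GroupActivityHead(nn.Module):
    """Toy pipeline: backbone person features -> TCE -> STBiP -> group logits."""

    def __init__(self, dim=256, num_classes=8):
        super().__init__()
        self.tce, self.stbip = TCE(dim), STBiP(dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, person_feats):           # (B, N, D) from a 2D backbone
        x = self.stbip(self.tce(person_feats))
        return self.cls(x.mean(dim=1))         # group-level prediction


if __name__ == "__main__":
    feats = torch.randn(2, 12, 256)            # 2 clips, 12 persons, 256-d features
    print(GroupActivityHead()(feats).shape)    # torch.Size([2, 8])

In the paper's terms, the Transformer stage corresponds to encoding global context into individual features, and the pairwise bilinear/graph stage to relational reasoning; the exact layer counts, gating form, and graph construction here are placeholders.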


Similar articles

Learning Temporal Context for Activity Recognition

Abstract. We present a method that improves activity recognition using temporal and spatial context. We investigate how incremental learning of long-term human activity patterns improves the accuracy of activity classification over time. Two datasets collected over several months containing hand-annotated activity in residential and office environments were chosen to evaluate the appro...


Machine learning based Visual Evoked Potential (VEP) Signals Recognition

Introduction: Visual evoked potentials contain certain diagnostic information which has proved to be of importance in assessing the visual system's functional integrity. Due to the substantial decrease of amplitude in extra-macular stimulation in commonly used pattern VEPs, differentiating normal and abnormal signals can prove to be quite an obstacle. Due to developments of use of machine l...


Iterative context compilation for visual object recognition

This contribution describes an almost parameterless iterative context compilation method, which produces feature layers that are especially suited for mixed bottom-up/top-down association architectures. The context model is simple and enables fast calculation. The resulting structures are invariant to position, scale, and rotation of input patterns.


Dialogue Context for Visual Feedback Recognition

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. When recognizing visual feedback, people use more than their visual perception. Knowledge about the current topic and expectations from previous utterances help guide our visual perception in recognizing nonverbal cues. In this chapter, we investigate how dial...


Visual Learning for Landmark Recognition

Recognizing landmarks is a critical task for mobile robots. Landmarks are used for robot positioning and for building maps of unknown environments. In this context, the traditional recognition techniques based on strong geometric models cannot be used. Rather, models of landmarks must be built from observations using image-based visual learning techniques. Beyond its application to mobile robot...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i4.16437